Abstract: We propose a method for estimating channel parameters
from RSSI measurements and the lost-packet count, which works in
the presence of losses due to both interference and signal
attenuation below the noise floor. This is especially important in
wireless networks, such as vehicular networks, where the
propagation model changes with the density of nodes. The method is
based on Stochastic Expectation Maximization, where the received
data is modeled as a mixture of distributions (no/low interference
and strong interference), incomplete (censored) due to packet
losses. The PDFs in the mixture are Gamma, according to the
commonly accepted model for wireless signal and interference
power. This approach leverages the loss count as additional
information, hence outperforming maximum likelihood estimation
that does not use this information (ML-) for a small number of
received RSSI samples. Hence, it allows inexpensive on-line
channel estimation from ad-hoc collected data. The method also
outperforms ML- on uncensored data mixtures, as ML- assumes that
samples are from a single-mode PDF.


I. Introduction 

For various reasons (such as participatory RF sensing in
order to develop low-cost RF maps [1], or calibrating
the channel in order to reproduce field trials in a simulator),
wireless systems often collect signal strength data on the fly,
i.e., in the course of actual operation. Such data is often
collected in the form of paired values of Tx-Rx distance and
the received signal strength indication (RSSI), which can be
thought of (within a known additive constant) as the received
power in dBm. RSSI is measured on a per-packet basis.
If there is too much noise and/or interference for a given
measurement, the packet can be lost, in which case only the
failure indication is recorded (indirectly, e.g., through packet
sequencing). The data reduction challenge is to reconstruct,
from the collection of recorded RSSI values and packets
tagged as lost, the probability density function (PDF) of the
received signal. With the PDF thus estimated, the analyst can
accurately model the propagation in the environment (e.g.,
path loss vs. distance), and also model interference effects
for a given scenario (e.g., geometry, spatial density of both
active and inactive Tx-s, etc.). The widespread adoption of
Nakagami PDFs for modeling radio links is justified by the
abundant analysis of empirical data [?], [?]. When we refer
to the Nakagami PDF, it implies the signal amplitude; the
corresponding power is Gamma-distributed, with the same
shape parameter $m$ and scale parameter $\Omega$, and the dB power
(hence, RSSI) can be thought of as log-Gamma. Note that
Nakagami with $m = 1$ corresponds to the Rayleigh distribution.

However, the problem of estimating the parameters of this
PDF based on packet data collected over time periods of
practical interest (the shorter the better) remains challenging.
The reason is the large number of lost (censored) samples caused
by interference and by low SNR due to fading or distance-based
attenuation. As interference is intermittent, there are
two broad classes of RSSI data points, namely, those with
no (or low) interference, and those with enough interference
to result in a significantly modified statistical model (different
PDF). Note that maximum likelihood (ML) [?], typically the
best approach for a single statistical model, does not offer a
closed-form solution for data mixtures with loss counts. To
derive the parameters of the PDFs featured in a censored mixture of
two random variables (RVs), representing samples with no/low
interference and samples with strong interference, we propose
the use of Stochastic Expectation-Maximization (SEM)
estimators. In addition, our approach leverages the loss count
as additional information to improve the estimation accuracy
for a given number of samples. We introduce the notation ML- to
denote the ML estimator that utilizes a single-mode PDF assumption
and only the received samples. In this paper, we demonstrate
that our approach performs better than ML- in the presence
of interference, because it starts with an assumption of two
components (dual mixture) and because it uses the loss count
as side information. It also outperforms ML- in cases without
interference if the number of received samples is small, which
is frequently the case in on-line estimation tasks.

The organization of the paper is as follows: in Section II
we briefly describe the system model based on an example,
while in Section III we introduce basic algorithmic elements;
in Section IV we present the algorithm used in our approach;
we evaluate our model on both simulated and empirical data,
and discuss the results, in Section V. In the last section we
conclude and address future work.


II. System Model and Motivational Example

We refer to no/low interference samples as the signal (or 1st)
component, and to strong interference samples as the interference
(or 2nd) component. We propose to have both signal and
interference components in the mixture modeled by the same
family of PDFs, i.e., Gamma. Properly parameterized Gamma
PDFs (GPDFs) are widely used to model small-scale fading,
to approximate the product of the small-scale and lognormal
fading distribution, and to approximate the interference power
[?]. Our claim that interference samples deserve to be modeled
by a 2nd component is supported by Fig. 1, where the
distortion caused by interference increases with the spatial
density of interferers. The field trial in which the samples
were collected included 200 (moving) vehicles equipped with
wireless modems, where the test first ran with 100 active
transmitters; then another 50 were added, and in the final
third of the test all 200 modems were transmitting (3 parts
delineated in Fig. 1). Note that these and other RSSI measurements
featured here are made on OFDM transmissions with
a 10 MHz bandwidth centered near 5.9 GHz, in compliance
with V2V DSRC IEEE 802.11p, using Atheros 802.11p chips.
It appears that fading increases as more Tx-s are activated
in the field, although the propagation environment has not
changed, due to the constant density of vehicles. This is the effect
of the random phases of the interferers; the sum of $M$ random
phasors with equal amplitudes approaches Rayleigh as $M$
grows. Hence, as the interference increases, $m$ in Gamma (and
Nakagami, for amplitudes) should approach 1. In this case,
the dB peak power (as in Fig. 1) is limited to $10\log_{10}(M)$
above the average power, but the dB power swings below the
average can be huge, because of the phasor-sum reductions. In
the Rayleigh limit (which $M = 10$ roughly approximates), the
probability of being 10 dB or more below the average is about
10%, while the probability of being 10 dB or more above the
average is 0.

Fig. 1. Time plots from [8], showing the effect of an increasing number of
active Tx-s on the RSSI of a single mobile link, with Tx at a constant distance
from Rx.
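The Rayleigh-limit figures quoted above can be checked numerically. The following is a minimal sketch, assuming M = 10 unit-amplitude phasors with independent uniform phases; the function names, the trial count, and the seed are illustrative choices and are not taken from the field trial.

```python
import cmath
import math
import random


def phasor_sum_power(num_phasors, rng):
    """Power of a sum of unit-amplitude phasors with independent uniform phases."""
    total = sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                for _ in range(num_phasors))
    return abs(total) ** 2


def fade_statistics(num_phasors=10, trials=20000, seed=1):
    """Return (mean power, fraction of samples >= 10 dB below mean, peak in dB above mean)."""
    rng = random.Random(seed)
    powers = [phasor_sum_power(num_phasors, rng) for _ in range(trials)]
    mean_power = sum(powers) / trials
    below = sum(p <= 0.1 * mean_power for p in powers) / trials
    peak_db = 10.0 * math.log10(max(powers) / mean_power)
    return mean_power, below, peak_db


# Exact Rayleigh limit: power is exponential, so P(power <= 0.1 * mean) = 1 - exp(-0.1).
rayleigh_below_10db = 1.0 - math.exp(-0.1)

mean_power, below, peak_db = fade_statistics()
print(f"Rayleigh limit, P(10 dB below mean): {rayleigh_below_10db:.4f}")
print(f"M = 10 simulation: mean power {mean_power:.2f}, "
      f"fraction 10 dB below mean {below:.4f}, peak {peak_db:.2f} dB above mean")
```

Because the summed amplitude can never exceed M, the simulated peak stays within about 10 log10(M) = 10 dB of the mean power, while deep fades remain frequent.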

For this reason, we model the 2nd component in the
mixture with a GPDF whose shape parameter $m$ is initially set to
one, while the 1st component (pure signal) is modeled with
a different $m$, initially set according to some side information
about the data origin (mobile, static, indoor, outdoor, rural,
urban, etc.). Starting with these and other initial values, the
SEM algorithm should eventually converge to parameters that
better characterize both the signal and the interference as
functions of the distance from the signal Tx. In each RSSI mixture
component, there are two sub-classes: received (uncensored)
data, and lost (censored) data. For the no/low interference
case, the censored data are mostly at large distances where the
median Rx power is attenuated to or below the noise threshold.
The Rx power can also go below the noise floor at any distance
as a result of deep fades, due to multi-path. Per Fig. 1, the
interference causes similar fading on RSSI samples, possibly
more intense, causing more losses.

The stochastic EM algorithm is a known approach for
computing ML estimates in the mixture problem. Our model is
derived from an extension of the SEM algorithm [?], dubbed
SEMcm, for a particular case of incomplete data where the
information loss is due to both a mixture of distributions and
censored observations. We aim to estimate the parameters of
a left-censored dual mixture, which we propose as a model
of observed wireless RSSI samples with countable losses,
following [?].


III. Basic Algorithmic Elements 

A mixture of 2 distributions of the same family $p(y|\phi_i)$, $i = 1, 2$,
is defined by

$$p_\psi(y) = \alpha_1\, p(y|\phi_1) + \alpha_2\, p(y|\phi_2). \tag{1}$$

Here, $y$ is the RV modeling an arbitrary mixture sample, and $\alpha_i$
is the mixing probability. Equivalently,

$$p_\psi(y, z) = \alpha_z\, p(y|\phi_z) \tag{2}$$

is the joint distribution of the RVs $Y$ and $Z$, where $Z$ is the
indicator RV modeling the association with one of the two
mixture components (with probability $\alpha_i$), and the subscript
$\psi$ represents the PDF parameters that we aim to estimate:

$$\psi = (\alpha_1, \alpha_2, \phi_1, \phi_2), \quad 0 < \alpha_1 < 1, \quad \alpha_1 = 1 - \alpha_2. \tag{3}$$


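We propose that de-logged RSSI values be modeled by a dual
mixture of GPDFs $p(y|\phi_i)$, $i = 1, 2$. Hence, with $\phi_i = (m_i, \Omega_i)$, we have

$$p(y|\phi_i) = \frac{y^{m_i - 1}\, e^{-y/\Omega_i}}{\Gamma(m_i)\, \Omega_i^{m_i}}, \quad y > 0. \tag{4}$$

This model is also depicted in plate notation in Fig. 2(a).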
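To make the dual Gamma mixture concrete, the following is a minimal sketch that evaluates the mixture density and draws labeled samples from it. It assumes the shape/scale parameterization in which the Gamma mean is $m\Omega$ (consistent with the mean $10\log_{10}(\Omega m)$ used later in the text); the function names and the parameter values in the usage lines are hypothetical.

```python
import math
import random


def gamma_pdf(y, m, omega):
    """Gamma density with shape m and scale omega (mean m * omega)."""
    if y <= 0.0:
        return 0.0
    log_pdf = ((m - 1.0) * math.log(y) - y / omega
               - math.lgamma(m) - m * math.log(omega))
    return math.exp(log_pdf)


def mixture_pdf(y, alpha1, phi1, phi2):
    """Two-component mixture density; phi1 and phi2 are (m, omega) pairs."""
    return alpha1 * gamma_pdf(y, *phi1) + (1.0 - alpha1) * gamma_pdf(y, *phi2)


def sample_mixture(n, alpha1, phi1, phi2, rng):
    """Draw n pairs (y, z): z is the component label (1 or 2), y the sample."""
    samples = []
    for _ in range(n):
        z = 1 if rng.random() < alpha1 else 2
        m, omega = phi1 if z == 1 else phi2
        samples.append((rng.gammavariate(m, omega), z))
    return samples


# Hypothetical parameters: a mild-fading "signal" and a Rayleigh-like "interference".
rng = random.Random(0)
phi_signal, phi_interf = (7.0, 1.0), (1.0, 0.5)
draws = sample_mixture(20000, 0.5, phi_signal, phi_interf, rng)
sample_mean = sum(y for y, _ in draws) / len(draws)
print(f"sample mean {sample_mean:.3f} (mixture mean is 0.5*7*1 + 0.5*1*0.5 = 3.75)")
```

In the estimation problem the label z is not observed; it is the latent indicator that the E-step and S-step reconstruct.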

Next, we introduce censoring: let $y \in R$, where $R$ is
partitioned into disjoint domains $R = R_0 \cup R_1$, where $R_0$ is the
subset of uncensored data, while $R_1$ is the subset corresponding
to left-censored data, i.e., $y < c_L$, where $c_L$ denotes the left
threshold. Let us assume that there are $n$ samples in total (e.g.,
$n$ transmitted packets), $r_0$ of which are uncensored (received
packets): $y_k = x_k \in R_0$, $k \in C_0$, $|C_0| = r_0$, and $r_1$
left-censored (lost) samples: $y_k \in R_1$, $k \in C_1$, $|C_1| = r_1$, where
$r_0 + r_1 = n$. Note that $C_0$ and $C_1$ are disjoint sets of sample
indices (e.g., packet sequence numbers, SNs), $x_k$ is the measured
value while $y_k$ is the real value (the two are not equal for censored
samples). In our model, the total number of samples and losses
can be obtained by tracking the SNs of received packets. We
define


$$\tau_i^{(p+1)}(x_k) = E\left[Z_{k,i} \mid y = x_k; \psi^{(p)}\right] = \frac{\alpha_i^{(p)}\, p(x_k|\phi_i^{(p)})}{\sum_{j=1}^{2} \alpha_j^{(p)}\, p(x_k|\phi_j^{(p)})}, \tag{5}$$

where $i = 1, 2$, $k \in C_0$, denoting the current
estimate of the probability that uncensored sample $x_k$ belongs
to component $i$; and

$$\tau_{Li}^{(p+1)} = E\left[Z_{Li} \mid y \in R_1; \psi^{(p)}\right] = \frac{\alpha_i^{(p)} \int_{R_1} p(y|\phi_i^{(p)})\, dy}{\int_{R_1} p_{\psi^{(p)}}(y)\, dy}, \tag{6}$$

where $i = 1, 2$, denoting the current estimate of the
probability that a left-censored sample belongs to component
$i$. The current estimate refers to the $(p+1)$th iteration of the
SEMcm algorithm (described in the next section). Observe
that we have 2 classes of binary latent variables in (5) and
(6), for $k \in C_0$ and $k \in C_1$, respectively. The 1st includes $r_0$
indicators $Z_{k1}$ characterized by probability of success $\tau_1^{(p+1)}(x_k)$
(probability of the 1st component), with $Z_{k1} = 1 - Z_{k2}$; the 2nd
class has a single RV $Z_{L1}$ indicating the 1st component w.p.
$\tau_{L1}^{(p+1)}$, with $Z_{L1} = 1 - Z_{L2}$. The censored model is also
depicted in plate notation in Fig. 2(b).

Fig. 2. Plate models for (a) uncensored dual mixture of Gamma components;
(b) censored dual mixture of Gamma components. Shaded circles represent
observables.
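The two membership probabilities, for a received sample and for a lost (left-censored) sample, can be sketched as follows. This is a minimal illustration under stated assumptions: Gamma components with shape $m$ and scale $\Omega$, a censoring region $y < c_L$, and a hand-written power-series evaluation of the Gamma CDF (adequate for moderate arguments only). All function names and parameter values are hypothetical.

```python
import math


def gamma_pdf(y, m, omega):
    """Gamma density with shape m and scale omega."""
    if y <= 0.0:
        return 0.0
    return math.exp((m - 1.0) * math.log(y) - y / omega
                    - math.lgamma(m) - m * math.log(omega))


def gamma_cdf(y, m, omega, terms=500):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if y <= 0.0:
        return 0.0
    x = y / omega
    term = 1.0 / m
    total = term
    for n in range(1, terms):
        term *= x / (m + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(m * math.log(x) - x - math.lgamma(m))


def responsibilities_uncensored(x, alpha, phi):
    """Probability that a received sample x belongs to each component."""
    weighted = [a * gamma_pdf(x, m, om) for a, (m, om) in zip(alpha, phi)]
    total = sum(weighted)
    return [w / total for w in weighted]


def responsibilities_censored(c_left, alpha, phi):
    """Probability that a lost sample (y < c_left) belongs to each component."""
    weighted = [a * gamma_cdf(c_left, m, om) for a, (m, om) in zip(alpha, phi)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical current parameter estimates.
alpha = (0.5, 0.5)
phi = ((7.0, 1.0), (1.0, 0.5))
tau = responsibilities_uncensored(7.0, alpha, phi)
tau_left = responsibilities_censored(0.5, alpha, phi)
print("received sample at 7.0:", tau)
print("lost sample below 0.5:", tau_left)
```

With these values, a strong received sample is attributed almost entirely to the first component, while a lost sample is attributed almost entirely to the second, which is the information the loss count contributes.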

IV. SEM-based Channel Estimation Algorithm 


Given samples of RSSI, and loss counts for different distances
$d$ between a Tx and an Rx, the goal is now to obtain
$\alpha_i$ and the two PDFs, $p(y|\phi_i)$, for $i = 1$ (signal component)
and $i = 2$ (interference component), as a function of distance
$d$. We refer to all lost samples as left-censored, as the noise
floor is on the left side of the support set of both components,
and to the noise floor as the left threshold $c_L$. Let us first
revisit the EM algorithm for mixture data without censoring.
We have samples $y$, but we are missing the indicator RVs $z$
in (2). The EM algorithm replaces the maximization of the
unknown $\log p_\psi(y, z)$ by iterative maximizations of the
log-likelihood expectation, conditionally on the observed sample
$x$, and for the current value of the parameter $\psi$ [?].

To calculate $Q(\psi, \psi^{(p)}) = E\left[\log p_\psi(y, z) \mid y = x, \psi^{(p)}\right]$ we
must derive the current conditional density of $(y, z)$ given $y = x$,

$$h(y, z \mid y, \psi^{(p)}) = \frac{p_{\psi^{(p)}}(y, z)}{p_{\psi^{(p)}}(y)}. \tag{7}$$

Iteration $p + 1$ has 2 steps:

E-step: Compute $h(y, z \mid y, \psi^{(p)})$ (hence $Q(\psi, \psi^{(p)})$).

M-step: Choose $\psi^{(p+1)} = \arg\max_\psi Q(\psi, \psi^{(p)})$.

Now, the stochastic EM (SEM) was introduced to overcome
the numerical limitations of EM. For the current value
$\psi^{(p)}$ of the parameter, it completes the observed samples by
replacing each missing datum by a value drawn at random from
$h(y, z \mid y, \psi^{(p)})$ (S-step), and then computes the ML estimate
based on the completed sample (M-step). We first define the
three steps for the left-censored dual mixture in general, and
then present the specific expressions for GPDFs.

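E-step: Compute $\tau_i^{(p+1)}(x_k)$ for $k \in C_0$, $i = 1, 2$.
Compute $\tau_{Li}^{(p+1)}$ for $i = 1, 2$.

S-step: (1) For $x_k \in R_0$, $k \in C_0$, simulate $r_0$ binary
vectors $z_k^{(p+1)} = (z_{k1}^{(p+1)}, z_{k2}^{(p+1)})$ by running Bernoulli
experiments w.p. $\tau_1^{(p+1)}(x_k)$; (2) simulate $r_1$ binary vectors
$z_{L,j}^{(p+1)} = (z_{L1,j}^{(p+1)}, z_{L2,j}^{(p+1)})$, $j = 1, \ldots, r_1$, each as a Bernoulli
experiment w.p. $\tau_{L1}^{(p+1)}$; (3) simulate $r_1$ missing left-censored
values $x_{L,j}$, sampling from

$$h(y \mid c_L, \psi^{(p)}) = \frac{p_{\psi^{(p)}}(y)}{\int_{R_1} p_{\psi^{(p)}}(y)\, dy}, \quad y \in R_1. \tag{8}$$

M-step: Choose $\psi^{(p+1)} = \arg\max_\psi Q(\psi, \psi^{(p)})$, where

$$Q(\psi, \psi^{(p)}) = \sum_{i=1}^{2} \left( \sum_{k \in C_0} z_{ki}^{(p+1)} + \sum_{j=1}^{r_1} z_{Li,j}^{(p+1)} \right) \log \alpha_i + \sum_{i=1}^{2} \left( \sum_{k \in C_0} z_{ki}^{(p+1)} \log p(x_k|\phi_i) + \sum_{j=1}^{r_1} z_{Li,j}^{(p+1)} \log p(x_{L,j}|\phi_i) \right). \tag{9}$$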
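The S-step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Gamma components with shape $m$ and scale $\Omega$, and it draws each missing value from the left-truncated density of the component selected by the simulated label, using plain rejection sampling. Taken over both components, this yields values distributed according to the truncated mixture. The names `draw_label`, `draw_left_truncated_gamma`, and `s_step` are hypothetical.

```python
import random


def draw_label(prob_first, rng):
    """Bernoulli experiment: component 1 with probability prob_first, else 2."""
    return 1 if rng.random() < prob_first else 2


def draw_left_truncated_gamma(m, omega, c_left, rng, max_tries=100000):
    """Rejection sampling from a Gamma(m, omega) restricted to y < c_left."""
    for _ in range(max_tries):
        y = rng.gammavariate(m, omega)
        if y < c_left:
            return y
    raise RuntimeError("censoring region too unlikely for rejection sampling")


def s_step(tau_uncensored, tau_censored_first, num_censored, phi, c_left, rng):
    """Simulate labels for received samples and (value, label) pairs for lost ones.

    tau_uncensored: per-sample probabilities of component 1 for received samples.
    tau_censored_first: probability of component 1 for a lost sample.
    phi: ((m1, omega1), (m2, omega2)), the current component parameters.
    """
    labels_uncensored = [draw_label(t, rng) for t in tau_uncensored]
    censored = []
    for _ in range(num_censored):
        label = draw_label(tau_censored_first, rng)
        m, omega = phi[label - 1]
        censored.append((draw_left_truncated_gamma(m, omega, c_left, rng), label))
    return labels_uncensored, censored


# Hypothetical inputs: 3 received samples, 50 lost packets, threshold 0.5.
rng = random.Random(3)
labels, censored = s_step([0.9, 0.1, 0.5], 0.2, 50,
                          ((3.0, 1.0), (1.0, 0.5)), 0.5, rng)
print(labels, censored[:3])
```

Rejection sampling is chosen here only for brevity; when the censoring region has very low probability under a component, an inverse-CDF sampler would be the more practical choice.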


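We next evaluate $Q(\psi, \psi^{(p)})$ for GPDFs, resulting in the
proposed channel estimation algorithm, dubbed SEMcmG:

E-step: as in the general E-step above, based on the GPDFs.
S-step: as in the general S-step above, do (1)-(3), based on the GPDFs.
M-step: Based on (9) and the GPDFs, solve

$$\begin{aligned}
\text{i.}\quad & \frac{\partial Q}{\partial \alpha_i} = 0 \;\Rightarrow\; \alpha_i^{(p+1)} = \frac{1}{n} \left( \sum_{k \in C_0} z_{ki}^{(p+1)} + \sum_{j=1}^{r_1} z_{Li,j}^{(p+1)} \right), \\
\text{ii.}\quad & \frac{\partial Q}{\partial \Omega_i} = 0 \;\Rightarrow\; \Omega_i^{(p+1)} = \frac{\sum_{k \in C_0} z_{ki}^{(p+1)} x_k + \sum_{j=1}^{r_1} z_{Li,j}^{(p+1)} x_{L,j}}{n\, \alpha_i^{(p+1)}\, m_i^{(p+1)}}, \\
\text{iii.}\quad & \frac{\partial Q}{\partial m_i} = 0, \quad \text{with } \Psi(x) = \frac{\Gamma'(x)}{\Gamma(x)}, \quad \Psi(x) \approx \log x - \frac{1}{2x}, \quad L_{i,x}^{p+1} = \log \frac{x}{\Omega_i^{(p+1)}}, \\
& L_{i,A}^{p+1} = \frac{\sum_{k \in C_0} z_{ki}^{(p+1)} L_{i,x_k}^{p+1} + \sum_{j=1}^{r_1} z_{Li,j}^{(p+1)} L_{i,x_{L,j}}^{p+1}}{n\, \alpha_i^{(p+1)}}.
\end{aligned} \tag{10}$$

Solve $L_{i,A}^{p+1} - \Psi(m_i^{(p+1)}) = 0$.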
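For one component, the M-step updates can be sketched as below. This is a minimal illustration under stated assumptions: the component's completed samples (received plus simulated censored values) have already been selected by the S-step labels, the Gamma density uses shape $m$ and scale $\Omega$, and the scale and shape equations are solved jointly, which reduces to the single equation $\log m - \Psi(m) = \log(\text{mean}) - \text{mean}(\log)$. The digamma routine and the bisection bounds are illustrative choices; the function names are hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma function via upward recurrence and an asymptotic expansion."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))


def m_step_component(values, n_total):
    """Return (alpha, m, omega) for one component from its completed samples."""
    count = len(values)
    alpha = count / n_total                      # share of all n samples
    mean = sum(values) / count
    mean_log = sum(math.log(v) for v in values) / count
    delta = math.log(mean) - mean_log            # non-negative by Jensen's inequality
    lo, hi = 1e-3, 1e3                           # assumed search range for m
    for _ in range(200):                         # bisection on a log scale
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > delta:
            lo = mid                             # m too small: left side too large
        else:
            hi = mid
    m = math.sqrt(lo * hi)
    return alpha, m, mean / m                    # scale follows from mean = m * omega


# Hypothetical check on synthetic data with known parameters m = 7, omega = 2.
rng = random.Random(0)
values = [rng.gammavariate(7.0, 2.0) for _ in range(20000)]
alpha_hat, m_hat, omega_hat = m_step_component(values, n_total=40000)
print(f"alpha {alpha_hat:.2f}, m {m_hat:.2f}, omega {omega_hat:.3f}")
```

The bisection relies on $\log m - \Psi(m)$ being strictly decreasing in $m$, so a single root exists whenever the samples are not all identical.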


Note that we are frequently averaging over the expected
number of samples. The total number of samples and losses can
be obtained by tracking sequence numbers of received packets.
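Deriving the totals from sequence numbers can be sketched as follows. This is a minimal illustration that assumes consecutive sequence numbers without wrap-around and that the first and last transmitted packets of the observation window were received; the function name `count_losses` is hypothetical.

```python
def count_losses(received_sequence_numbers):
    """Return (n, r0, r1): transmitted, received, and lost packet counts.

    Assumes sequence numbers increase by one per transmitted packet and do
    not wrap around; duplicates of a received packet are counted once.
    """
    unique = sorted(set(received_sequence_numbers))
    n_total = unique[-1] - unique[0] + 1
    r0 = len(unique)
    return n_total, r0, n_total - r0


n, r0, r1 = count_losses([3, 4, 6, 9, 9])
print(n, r0, r1)
```

Losses before the first or after the last received packet are invisible to this count, so in practice the window boundaries would need to come from another source, such as a known transmission schedule.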





























































Fig. 3. The making of a left-censored 2-mixture representing mobile Rx
signals (PL) with interference and attenuation.


V. Evaluation 

A. Model Evaluation on Simulated Data 

Besides evaluating the SEMcm algorithm on some trivial data
sets (one component with left-censoring [?]; one doubly-censored
component), we successfully evaluated SEMcmG on
a simulated mixture of two left-censored components, which
was meant to emulate interference-affected RSSI samples.
The first component represents the signal over a distance
range identical to the range considered in the empirical data
evaluation: $l_d = 23$ to $32$, where $l_d$ is the log-distance, defined
as $10\log_{10}$(distance in m). The second component models
RSSI samples with strong interference over the same distance
range. We simulated different parameters, mostly with the
interference component having $m = 1$ (i.e., $m_2 = 1$), following
our discussion in Section II. The results are encouraging.
However, we now present a mixture with arbitrary parameters,
chosen to create a signal cloud visually distinguishable from
the interference cloud in the mixture scatter plot (bottom-left
pane in Fig. 3), while capable of exemplifying the main concerns
about censored RSSI mixtures. The value of $m_1$ is chosen slightly
high for the assumed mobile signal ($m_1 = 7$), while $m_2 = 35$;
such a high value of $m_2$ may represent a single (or dominant)
interferer.

Signal attenuation over space is exponential, with the
attenuation coefficient to be determined through parameter
estimation. We choose to present the exponential attenuation
in the dB domain as a linear function of $l_d$. Hence, as in our prior
work, the median path loss [PL] is fitted by the straight-line
function

$$[PL] = A - B\, l_d. \tag{11}$$

Note that PL is defined as $PL = RSSI - 10\log_{10}(P_t)$, where
$P_t$ is the Tx power. Hence, it is distributed as log-Gamma. We
present data points in some of our plots as PL rather than RSSI,
as it reflects the propagation medium only (independent of Tx
power). The simulated $\Omega$ was chosen so that the linear fit to
the dBm value of the Gamma mean (i.e., $10\log_{10}(\Omega m)$) vs.
Tx-Rx distance is equal to (11) with $A = -16$, $B = 3$. Note that $\Omega$
is a function $\Omega(l_d)$. With these values, the signal-only scatter
plot (in dB) is presented in the upper-left corner of Fig. 3. As
for interference, for simplicity and without loss of generality,
we propose that the median interference is constant over space,
e.g., assuming one distant interferer. Such interference points
(dBm) are shown in the upper-right plot in Fig. 3.

Fig. 4. Estimated mean (green) is almost identical to the real one for
most distances (except for cluster-overlap bins), so that its linear fit (cyan)
covers the black line (real mean from the bottom-right plot of Fig. 3).

Fig. 5. The SEM estimate of the m parameter diverges from the real one in
cluster-overlap bins (as do ML and MB). For other bins, ML and MB take the
interference as part of the signal and estimate higher fading (m below 1).

Notice that for both components we generated points for
discrete values of $l_d$, referred to as distance bins, with 0.5
dB spacing in between. For each bin we generated 1000 signal
(or interference) points, referred to as bin arrays. Then, for each
bin and each sample index in the bin (1-1000), we selected
with probability 0.5 either the signal or the interference point in
that place, making a balanced mixture of the two components, and
ending up with 1000 points per bin (bottom-left in Fig. 3). The
choice of the mixing coefficient that gives equal weight to
both components is deliberate, as such mixtures were hardest
to separate. Finally, we censor (drop) the points that are below
the threshold $c_L = -109$ (indicated in the bottom plots of Fig.
3), resulting in the set of points in the bottom-right plot of
Fig. 3. These are the points fed into SEMcmG, along with the
initial values of the parameters, and the information on how
many samples per bin were censored. The initial values were
distorted with respect to the real values by up to 50%. Observe in
the bottom-right plot of Fig. 3 the red line that was obtained
as a Least Square Error (LSE) estimate of the mean of the
censored mixture, as opposed to the black line that represents
the real mean of the signal. This illustrates how much the
assumption of one component (as in the presented LSE) can
cost in terms of estimation error. With SEM, the estimates (per
bin) matched the real values for most simulated mixtures if the
data losses constituted less than 60-70% of the data, while for
higher losses they were just better than the ML estimates. For this
particular mixture, losses were up to 45% (Fig. [?]), in order to
highlight the cluster overlap problem, i.e., the distance bins where
the median values of the components were indistinguishable.
Please observe the green line in Fig. 4, which illustrates the
signal's mean estimate. Note that only in the area around
$l_d = 27$ (cluster overlap) does SEM diverge from the real
mean, while following the LSE mean estimate, and in the same
area the interference mean estimate follows that of the real
signal. We are looking into additional mechanisms to address
this phenomenon.

Fig. 6. Convergence of the parameter estimates over 30 iterations to the known
real values (bin 10): $\Omega$ estimate in the upper plot, m estimate in the bottom plot.
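The construction of one censored distance bin can be sketched as follows. This is a minimal illustration using the values stated in the text ($A = -16$, $B = 3$, $m_1 = 7$, $m_2 = 35$, equal mixing, threshold $-109$, 1000 points per bin). The constant interference mean of $-98$ dB is a hypothetical value chosen only for illustration, since the text does not state it, and the function name `simulate_bin` is likewise an assumption.

```python
import math
import random


def simulate_bin(log_distance, rng, n=1000, a=-16.0, b=3.0,
                 m_signal=7.0, m_interf=35.0, interf_mean_db=-98.0,
                 alpha_signal=0.5, threshold_db=-109.0):
    """Return (received dB values, number of censored points) for one bin.

    The signal mean follows the straight line a - b * log_distance in dB;
    interf_mean_db is a hypothetical constant interference level.
    """
    signal_mean = 10.0 ** ((a - b * log_distance) / 10.0)
    interf_mean = 10.0 ** (interf_mean_db / 10.0)
    received_db = []
    for _ in range(n):
        if rng.random() < alpha_signal:
            # Gamma with shape m and scale mean / m has the requested mean.
            power = rng.gammavariate(m_signal, signal_mean / m_signal)
        else:
            power = rng.gammavariate(m_interf, interf_mean / m_interf)
        level_db = 10.0 * math.log10(power)
        if level_db >= threshold_db:
            received_db.append(level_db)   # kept; otherwise only counted as lost
    return received_db, n - len(received_db)


rng = random.Random(7)
near_rx, near_lost = simulate_bin(23.0, rng)
far_rx, far_lost = simulate_bin(32.0, rng)
print(f"l_d = 23: {near_lost} lost; l_d = 32: {far_lost} lost")
```

Only the received values and the per-bin loss count would be passed to the estimator, mirroring what a receiver can observe.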

Fig. 5 shows the m estimate per bin. Again, as the likelihood
equations are intractable for any maximum likelihood estimate,
we compare our results for the m parameter with good existing
approximations. The ML and moment-based (MB) estimates
in Fig. 5 are calculated based on the $r$ received samples. The
former is obtained according to the following maximum
likelihood approximation [?]:

$$m_{ML} = \frac{6 + \sqrt{36 + 48\Delta}}{24\Delta}, \quad \text{where} \quad \Delta = \ln\left[\frac{1}{r}\sum_{i=1}^{r} p_i\right] - \frac{1}{r}\sum_{i=1}^{r} \ln p_i,$$

and $p_i$ is the Rx power sample (de-logged RSSI). The latter
follows eqn. (10) from [?], which is based on the first
two sample moments of the received power $p_i$. The ML and
MB estimates never outperform the SEM estimate, not even in
the cluster overlap area (Fig. 5). In fact, outside the overlap,
ML and MB produce huge errors, as they assume
one Gamma distribution and hence interpret
the wide clouds outside of the central area as a sign of deep
fades; consequently, m is estimated too low (around 1). This
is a very important argument for the proposed approach, as
interference clearly cannot be accounted for by any single-component
model. A feature of interest for on-line estimation
is the convergence rate. We illustrate it in Fig. 6 for both $\Omega$ and
m and a given bin, as a step-by-step evolution of the estimated
parameter. It seems that both estimates could have been better
if we had run some additional iterations.
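The closed-form ML approximation for m can be sketched as follows. It is the positive root of $12\Delta m^2 - 6m - 1 = 0$, which results from truncating the digamma expansion in $\ln m - \Psi(m) = \Delta$ to $1/(2m) + 1/(12m^2)$. The function name and the synthetic test parameters are hypothetical. The second call illustrates the failure mode discussed above: a single-component estimate applied to a two-component mixture returns a very low m.

```python
import math
import random


def m_ml_approx(powers):
    """Approximate ML estimate of the Gamma shape m from linear power samples."""
    r = len(powers)
    delta = math.log(sum(powers) / r) - sum(math.log(p) for p in powers) / r
    return (6.0 + math.sqrt(36.0 + 48.0 * delta)) / (24.0 * delta)


rng = random.Random(0)

# Single Gamma component with m = 7: the estimate should land near 7.
single = [rng.gammavariate(7.0, 1.0) for _ in range(20000)]

# Balanced mixture of two components with very different mean powers.
mixed = [rng.gammavariate(7.0, 1.0) if rng.random() < 0.5
         else rng.gammavariate(35.0, 0.01) for _ in range(20000)]

m_single = m_ml_approx(single)
m_mixed = m_ml_approx(mixed)
print(f"single component: m = {m_single:.2f}; mixture read as one component: m = {m_mixed:.2f}")
```

Neither component of the synthetic mixture has m below 7, yet the single-component estimate falls below 1, because the spread between the two clusters is read as deep fading.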
Fig. 7. Both the mean ($10\log_{10}(\Omega m)$), in the top plot, and m, in the middle,
follow the ML estimates when losses are < 60% (bottom).


B. Model Evaluation on Empirical Data 

The ZR trial, described in [?], included only one Tx at
a time, mounted on a vehicle that traveled back and forth 
from the static Rx-s on a straight road dmax=^2CtCt m long. 
This scenario with no interference helped us to study the
performance of our SEMcm algorithms in terms of signal
component estimates, when the initial values for the (non-existent)
interference were arbitrary. As the Tx was mobile,
suggesting a Gamma distribution, SEMcm with Gaussian PDFs
gave bad estimates (as expected) and numerical instabilities.
SEMcmG showed good results. The initial values for the signal
parameters were taken from imperfect estimates, based on the
linear LSE fit to a path-loss function that was linear only
beyond a break point, and also affected by noise-floor saturation
(Fig. 7, upper pane).

For simplicity, we performed SEMcmG only for the distance
bins after the break point (2nd segment), as the smaller distances
involved the two-ray phenomenon. The linear fit of the
initial $\Omega m$ in dBm, represented by the yellow line in Fig. 7,
matches (11) with coefficients $A_2$ and $B_2$. Other coefficients,
based on the LSE over the 2nd segment only, came closer to
the real median PL (known from running the same field trial
with higher Tx power, which avoids the noise floor within the
traversed distances).

The SEMcm estimated line (red line with circular markers)
is almost the same as the real one. The initial value for m
was 1.5 (bottom black line in the middle pane of Fig. 7), yet
SEMcmG managed to improve it to 3.3 on average, which
is identical to its ML estimate. Now, the ML estimate works
optimally when there is a sufficient number of samples, which
was the case here. The bottom pane of Fig. 7 shows the number
of transmitted packets in black, and the number of lost packets
in red. The last bin has the worst losses (75%), yet more than
500 received packets are sufficient for ML.

Fig. 8. Log-Gamma component PDFs ($f_1$ and $f_2$), based on the SEMcm
estimates, given data from the 3rd part in Fig. 1 for a given distance bin.
Mixing coefficients are found to be 0.1 and 0.9; the mixture PDF with these $\alpha_i$
is shown as an insert, along with the RSSI histogram of that distance bin. The
arrows point to the similarity of the estimated PDF shapes and empirical data.

In conclusion, without interference, SEMcm outperforms
the LSE approach in estimating the mean ($10\log_{10}(\Omega m)$),
while it is comparable to ML in estimating m. Finally, we
present Fig. 8, which is based on the data featured in Fig. 1.
Apart from showcasing the notion of dual mixture and censoring,
this figure affirms the censored mixture approach, as it
illustrates a good match between the SEM-reconstructed PDF
of the data featured in Fig. 1 and its empirical distribution.
Observe that the points left of the black vertical line around
-115 dBm represent censored samples (i.e., $c_L = -115$).

VI. Conclusion 

Our main contribution is a novel model of interference-affected
RSSI samples, presented as a censored mixture of
Gamma PDFs, based on the insight from data collected for
varying interference levels (see Fig. 1). Also, we applied
a flavor of the EM algorithm which not only mechanizes the
computation of the parameters' ML estimates for our complex
statistical model of incomplete non-Gaussian mixed data [?],
[?], but also utilizes stochastic randomization to avoid strong
dependence on its starting position, convergence to a saddle
point, and a low convergence rate. A valuable property of this
method is that it leverages the count of lost data to improve
estimates for a small number of samples, which is especially
important for online estimation based on crowd-sourced data.

Our future work will explore online versions of EM algorithms
[?] applied to our problem. Also, future work will
address improvements for signal levels that are on average
too close to interference levels, such as in the cluster-overlap bins
in Figures 4 and 5. Although this is a common problem in
data clustering, we believe that good predictive models for
cluster overlaps could be developed based on signal samples









